Abstract: The uniqueness theorem for a two-parameter extended relative entropy is proven. This result extends our previous one, the uniqueness theorem for a one-parameter extended relative entropy, to the two-parameter case.



I. Introduction 

Shannon entropy [1] is one of the fundamental quantities in classical information theory and is uniquely determined by the Shannon-Khinchin axiom or the Faddeev axiom. One-parameter extensions of Shannon entropy have been studied by many researchers [2]. The Rényi entropy [3] and the Tsallis entropy [4] are the best-known examples. In [5], the uniqueness theorem for the Tsallis entropy was proven. See also [6] and the references therein for axiomatic characterizations of one-parameter extended entropies. Recently, a two-parameter extended entropy was studied by several researchers [7], [8], [9], [10], [11], [12], and the uniqueness theorem for a two-parameter extended entropy was proven in [12] by generalizing the Shannon-Khinchin axiom. A two-parameter extended entropy is defined by



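$$S_{\alpha,\beta}(x_1,\cdots,x_n) \equiv \sum_{j=1}^{n} \frac{x_j^{\alpha} - x_j^{\beta}}{\beta - \alpha}$$

for two real numbers $\alpha$ and $\beta$ such that $0 \le \alpha \le 1 \le \beta$ or $0 \le \beta \le 1 \le \alpha$. If we take $\alpha = 1$ or $\beta = 1$, then it recovers the Tsallis entropy defined by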



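$$S_q(x_1,\cdots,x_n) \equiv \sum_{j=1}^{n} \frac{x_j^{q} - x_j}{1 - q}, \quad (q \ge 0,\ q \neq 1).$$

The Tsallis entropy recovers Shannon entropy

$$S_1(x_1,\cdots,x_n) \equiv -\sum_{j=1}^{n} x_j \log x_j$$

in the limit $q \to 1$.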
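The three entropies above can be compared numerically. The following is a minimal sketch, assuming the definitions $S_{\alpha,\beta}(X) = \sum_j (x_j^{\alpha} - x_j^{\beta})/(\beta-\alpha)$ and $S_q(X) = \sum_j (x_j^{q} - x_j)/(1-q)$; the function names and the test distribution are illustrative choices, not part of the original paper.

```python
import math

def two_param_entropy(x, alpha, beta):
    """S_{alpha,beta}(X) = sum_j (x_j^alpha - x_j^beta) / (beta - alpha)."""
    return sum(p**alpha - p**beta for p in x if p > 0) / (beta - alpha)

def tsallis_entropy(x, q):
    """S_q(X) = sum_j (x_j^q - x_j) / (1 - q), for q != 1."""
    return sum(p**q - p for p in x if p > 0) / (1.0 - q)

def shannon_entropy(x):
    """S_1(X) = -sum_j x_j log x_j (natural logarithm)."""
    return -sum(p * math.log(p) for p in x if p > 0)

x = [0.5, 0.3, 0.2]  # illustrative probability distribution

# Setting alpha = 1 should reproduce the Tsallis entropy with q = beta.
print(two_param_entropy(x, 1.0, 1.7), tsallis_entropy(x, 1.7))

# Taking q close to 1 should approach the Shannon entropy.
print(tsallis_entropy(x, 1.0 + 1e-6), shannon_entropy(x))
```

Each printed pair should agree, the first up to rounding and the second up to an error of the order of $|q-1|$.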

In this paper, we study an information measure (entropy) defined for two probability distributions. The relative entropy (Kullback-Leibler information or divergence) is defined for two probability distributions $X = \{x_1,\cdots,x_n\}$ and $Y = \{y_1,\cdots,y_n\}$ by

$$D_1(X\|Y) \equiv \sum_{j=1}^{n} x_j\left(\log x_j - \log y_j\right).$$



Since Shannon entropy is defined for one probability distribution and can be reproduced from the relative entropy as $\log n - D_1(X\|U)$ for the uniform distribution $U = \{1/n,\cdots,1/n\}$, the relative entropy can be regarded as a generalization of Shannon entropy. We note here that there are one-parameter extended relative entropies such as the Rényi relative entropy $D_q^R(X\|Y)$, the $\alpha$-divergence $D^{(\alpha)}(X\|Y)$ and the Tsallis relative entropy $D_q^T(X\|Y)$. These are defined by

$$D_q^R(X\|Y) \equiv \frac{1}{q-1}\log \sum_{j=1}^{n} x_j^{q} y_j^{1-q},$$

$$D^{(\alpha)}(X\|Y) \equiv \frac{4}{1-\alpha^2}\left(1 - \sum_{j=1}^{n} x_j^{\frac{1-\alpha}{2}} y_j^{\frac{1+\alpha}{2}}\right),$$

$$D_q^T(X\|Y) \equiv \sum_{j=1}^{n} \frac{x_j - x_j^{q} y_j^{1-q}}{1-q}$$

for $q \neq 1$ and $\alpha \neq \pm 1$. These quantities recover the relative entropy in the limit $q \to 1$ or $\alpha \to \pm 1$. These quantities are also essentially the same in the sense that



$$D^{(\alpha)}(X\|Y) = \frac{1}{q} D_q^T(X\|Y), \quad (q \neq 0, 1),$$

$$D_q^R(X\|Y) = \frac{1}{q-1}\log\left\{1 + (q-1) D_q^T(X\|Y)\right\},$$

where we set $q = \frac{1-\alpha}{2}$ in $D^{(\alpha)}(X\|Y)$. Here, we note that the form $\sum_{j=1}^{n} x_j^{q} y_j^{1-q}$ appears in all of these one-parameter extended relative entropies. Therefore it is sufficient to study one of them in order to study a one-parameter extension of the relative entropy. Thus the uniqueness theorem for the Tsallis relative entropy was proven in our previous paper [14].
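The relations among the one-parameter relative entropies can be checked numerically. The sketch below assumes the standard forms $D_q^R = \frac{1}{q-1}\log\sum_j x_j^q y_j^{1-q}$, $D^{(\alpha)} = \frac{4}{1-\alpha^2}(1 - \sum_j x_j^{(1-\alpha)/2} y_j^{(1+\alpha)/2})$ and $D_q^T = \sum_j (x_j - x_j^q y_j^{1-q})/(1-q)$, with $q = (1-\alpha)/2$; all function names and sample distributions are illustrative.

```python
import math

def kl(x, y):
    """Relative entropy D_1(X||Y) = sum_j x_j (log x_j - log y_j)."""
    return sum(a * (math.log(a) - math.log(b)) for a, b in zip(x, y))

def renyi_rel(x, y, q):
    """Renyi relative entropy for q != 1."""
    return math.log(sum(a**q * b**(1 - q) for a, b in zip(x, y))) / (q - 1)

def alpha_div(x, y, alpha):
    """alpha-divergence for alpha != +-1."""
    s = sum(a**((1 - alpha) / 2) * b**((1 + alpha) / 2) for a, b in zip(x, y))
    return 4.0 / (1 - alpha**2) * (1 - s)

def tsallis_rel(x, y, q):
    """Tsallis relative entropy for q != 1."""
    return sum(a - a**q * b**(1 - q) for a, b in zip(x, y)) / (1 - q)

x = [0.5, 0.3, 0.2]   # illustrative distributions with positive entries
y = [0.2, 0.5, 0.3]
alpha = 0.4
q = (1 - alpha) / 2   # q = 0.3

t = tsallis_rel(x, y, q)
print(alpha_div(x, y, alpha), t / q)                        # D^(alpha) = D_q^T / q
print(renyi_rel(x, y, q), math.log(1 + (q - 1) * t) / (q - 1))

# Shannon entropy as log n - D_1(X||U) for the uniform distribution U.
n = len(x)
u = [1.0 / n] * n
shannon = -sum(a * math.log(a) for a in x)
print(shannon, math.log(n) - kl(x, u))
```

Each printed pair should coincide up to floating-point rounding.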

In the present paper, as a further extension of our previous result, we give a two-parameter extended axiom for a function defined for any pair of probability distributions and prove the uniqueness theorem for a two-parameter extended relative entropy. This paper is organized as follows. In Section II, we review the uniqueness theorem for the relative entropy proven by A. Hobson, and the uniqueness theorem for a one-parameter extended relative entropy. In Section III, we show our main theorem. In Section IV, we characterize the constant that appears in Section III. In Section V, we give properties of a two-parameter extended relative entropy.
II. Review of the uniqueness theorem for one-parameter extended relative entropy

The uniqueness theorem for the relative entropy was shown by A. Hobson as follows [13]:

Theorem II.1 ([13]) The function $D_1(A\|B)$ is assumed to be defined for any two probability distributions $A = \{a_j\}$ and $B = \{b_j\}$ for $j = 1,\cdots,n$. If $D_1(A\|B)$ satisfies the following conditions (R1)-(R5), then it is given by the form $k\sum_{j=1}^{n} a_j \log\frac{a_j}{b_j}$ with a positive constant $k$.

(R1) Continuity: $D_1(A\|B)$ is a continuous function of $2n$ variables.

(R2) Symmetry:

$$D_1(a_1,\cdots,a_j,\cdots,a_k,\cdots,a_n\|b_1,\cdots,b_j,\cdots,b_k,\cdots,b_n) = D_1(a_1,\cdots,a_k,\cdots,a_j,\cdots,a_n\|b_1,\cdots,b_k,\cdots,b_j,\cdots,b_n).$$

(R3) Additivity:

$$D_1(a_{11},\cdots,a_{1m},a_{21},\cdots,a_{2m}\|b_{11},\cdots,b_{1m},b_{21},\cdots,b_{2m}) = D_1(c_1,c_2\|d_1,d_2) + c_1 D_1\left(\frac{a_{11}}{c_1},\cdots,\frac{a_{1m}}{c_1}\,\Big\|\,\frac{b_{11}}{d_1},\cdots,\frac{b_{1m}}{d_1}\right) + c_2 D_1\left(\frac{a_{21}}{c_2},\cdots,\frac{a_{2m}}{c_2}\,\Big\|\,\frac{b_{21}}{d_2},\cdots,\frac{b_{2m}}{d_2}\right), \quad (2)$$

where $c_i = \sum_{j=1}^{m} a_{ij}$ and $d_i = \sum_{j=1}^{m} b_{ij}$.

(R4) $D_1(A\|B) = 0$ if $a_j = b_j$ for all $j$.

(R5) $D_1\left(\frac{1}{n},\cdots,\frac{1}{n},0,\cdots,0\,\Big\|\,\frac{1}{n_0},\cdots,\frac{1}{n_0}\right)$ is an increasing function of $n_0$ and a decreasing function of $n$, for any integers $n$ and $n_0$ such that $n_0 \ge n$.

As a one-parameter extension, we gave the uniqueness theorem for the Tsallis relative entropy as follows. The function $D_q$ is defined for the probability distributions $A = \{a_j\}$ and $B = \{b_j\}$ on a finite probability space with one parameter $q \ge 0$. The one-parameter extended relative entropy (Tsallis relative entropy) was characterized by means of the following triplet of generalized conditions (OR1), (OR2) and (OR3).

Axiom II.2 ([14])

(OR1) Continuity: $D_q(a_1,\cdots,a_n\|b_1,\cdots,b_n)$ is a continuous function of $2n$ variables.

(OR2) Symmetry:

$$D_q(a_1,\cdots,a_j,\cdots,a_k,\cdots,a_n\|b_1,\cdots,b_j,\cdots,b_k,\cdots,b_n) = D_q(a_1,\cdots,a_k,\cdots,a_j,\cdots,a_n\|b_1,\cdots,b_k,\cdots,b_j,\cdots,b_n).$$

(OR3) Additivity:

$$D_q(a_{11},\cdots,a_{1m},\cdots,a_{n1},\cdots,a_{nm}\|b_{11},\cdots,b_{1m},\cdots,b_{n1},\cdots,b_{nm}) = D_q(c_1,\cdots,c_n\|d_1,\cdots,d_n) + \sum_{i=1}^{n} c_i^{q} d_i^{1-q} D_q\left(\frac{a_{i1}}{c_i},\cdots,\frac{a_{im}}{c_i}\,\Big\|\,\frac{b_{i1}}{d_i},\cdots,\frac{b_{im}}{d_i}\right),$$

where $c_i = \sum_{j=1}^{m} a_{ij}$ and $d_i = \sum_{j=1}^{m} b_{ij}$.

Then, we have the following theorem.

Theorem II.3 ([14]) If conditions (OR1), (OR2) and (OR3) hold, then $D_q(A\|B)$ is given in the following form:

$$D_q(A\|B) = \sum_{j=1}^{n} \frac{a_j - a_j^{q} b_j^{1-q}}{\phi(q)},$$

with a certain constant $\phi(q)$ depending on the parameter $q$.

III. Uniqueness theorem for two-parameter extended relative entropy

In our previous paper [14], we gave Axiom II.2 in order to characterize the Tsallis relative entropy (one-parameter extended relative entropy). In this section, we prove the uniqueness theorem for a two-parameter extended relative entropy.

Theorem III.1 If the function $D_{\alpha,\beta}(X\|Y)$, defined for any pair of probability distributions $X = \{x_1,\cdots,x_n\}$ and $Y = \{y_1,\cdots,y_n\}$ on a finite probability space, satisfies the conditions (TR1)-(TR3) below, then $D_{\alpha,\beta}(X\|Y)$ is uniquely given by the form

$$D_{\alpha,\beta}(X\|Y) = \sum_{j=1}^{n} \frac{x_j^{\alpha} y_j^{1-\alpha} - x_j^{\beta} y_j^{1-\beta}}{\phi(\alpha,\beta)} \quad (3)$$

with a certain constant $\phi(\alpha,\beta)$ depending on two parameters $\alpha$ and $\beta$.

(TR1) Continuity: $D_{\alpha,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n)$ is a continuous function of $2n$ variables.

(TR2) Symmetry:

$$D_{\alpha,\beta}(x_1,\cdots,x_j,\cdots,x_k,\cdots,x_n\|y_1,\cdots,y_j,\cdots,y_k,\cdots,y_n) = D_{\alpha,\beta}(x_1,\cdots,x_k,\cdots,x_j,\cdots,x_n\|y_1,\cdots,y_k,\cdots,y_j,\cdots,y_n).$$

(TR3) Additivity:

$$D_{\alpha,\beta}(x_{11},\cdots,x_{1m},\cdots,x_{n1},\cdots,x_{nm}\|y_{11},\cdots,y_{1m},\cdots,y_{n1},\cdots,y_{nm}) = D_{\alpha,\beta}(z_1,\cdots,z_n\|w_1,\cdots,w_n)\sum_{j=1}^{m}\left(\frac{x_{ij}}{z_i}\right)^{\beta}\left(\frac{y_{ij}}{w_i}\right)^{1-\beta} + \sum_{i=1}^{n} z_i^{\alpha} w_i^{1-\alpha} D_{\alpha,\beta}\left(\frac{x_{i1}}{z_i},\cdots,\frac{x_{im}}{z_i}\,\Big\|\,\frac{y_{i1}}{w_i},\cdots,\frac{y_{im}}{w_i}\right), \quad (4)$$

where $z_i = \sum_{j=1}^{m} x_{ij}$ and $w_i = \sum_{j=1}^{m} y_{ij}$.
Proof: From (TR2), we have 
From (TR3), we also have



From (TR2) and (TR3), we then have 



D, 



a,/3 



1 



= S 



n /3 /1X 1-/J 



S \t 



1 

0,---,0 



u) \v 



D, 



a,0 





1 








SU 


1 




to 


'"' to ) 
l




o,- 
1


7 


0,- 



1 



a,/3 



1 1 



1 


1 


n ' 

E 

fe=i 


' n 

E ifc 
fe=i 


o,---,o 


i 

n 

E m fe 



o,- ••,(),••• 



' n ' ' n i 

E ^ E ^ 

fe=i 



fe=i 



'"-ft / 

fe=l / 



From above two equations, we have 
,1 1 1 

D, 



= D a .j3 (Zl , ■ ■ ■ , Z n | \w X , • • • , W n ) ^2 



= 1 



IV ( 1 



h ) V mi 
1 



1-/3 



OL.J3 I i i ) ; 5 i ; 

1 SU SU SU SU 



0, •■■,(), •■■,(), •■■,() 



1 



S\l-/3 /l 1 

D a ,p - -,0,---,0 

/ u \ i— Q / 1 I 

+ - D Qi/J -,0,---,0 

Vw/ V s s 



1 1 

; * * * ) 

V V 

1 1 



»=i ^ 1 (j 

since s^- = for j = Zj + 1, • ■ • , m. Thus we have 

A*,/3 (2i,---,Z„||wi,---,W„) 



n n 



fc,0 ( E 'ki E m k - E z f w 4 %,/? ft, mi) 

vfe=l fc=l / i=l 



£(*)"(£) 



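If we put

$$f_{\alpha,\beta}(s,t) \equiv D_{\alpha,\beta}\left(\frac{1}{s},\cdots,\frac{1}{s},0,\cdots,0\,\Big\|\,\frac{1}{t},\cdots,\frac{1}{t}\right),$$

then we have

$$f_{\alpha,\beta}(su,tv) = \left(\frac{s}{t}\right)^{1-\alpha} f_{\alpha,\beta}(u,v) + \left(\frac{u}{v}\right)^{1-\beta} f_{\alpha,\beta}(s,t).$$

We also have

$$f_{\alpha,\beta}(us,vt) = \left(\frac{u}{v}\right)^{1-\alpha} f_{\alpha,\beta}(s,t) + \left(\frac{s}{t}\right)^{1-\beta} f_{\alpha,\beta}(u,v),$$

by exchanging $s$ with $u$ and $t$ with $v$ in the above equation.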
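From the above two equations, we have

$$\frac{f_{\alpha,\beta}(s,t)}{\left(\frac{s}{t}\right)^{1-\alpha} - \left(\frac{s}{t}\right)^{1-\beta}} = \frac{f_{\alpha,\beta}(u,v)}{\left(\frac{u}{v}\right)^{1-\alpha} - \left(\frac{u}{v}\right)^{1-\beta}}.$$

Therefore we have

$$f_{\alpha,\beta}(s,t) = \frac{\left(\frac{s}{t}\right)^{1-\alpha} - \left(\frac{s}{t}\right)^{1-\beta}}{\phi(\alpha,\beta)}.$$

For two natural numbers $l_i$ and $m_i$ such that $l_i \le m_i$, we put

$$x_{ij} = \frac{1}{\sum_{k=1}^{n} l_k}, \quad (i = 1,\cdots,n;\ j = 1,\cdots,l_i)$$

and

$$y_{ij} = \frac{1}{\sum_{k=1}^{n} m_k}, \quad (i = 1,\cdots,n;\ j = 1,\cdots,m_i).$$

Here we have

$$\sum_{i=1}^{n} z_i^{r} w_i^{1-r} = \sum_{i=1}^{n} \frac{l_i^{r}\, m_i^{1-r}}{\left(\sum_{k=1}^{n} l_k\right)^{r}\left(\sum_{k=1}^{n} m_k\right)^{1-r}}, \quad (r \in \mathbb{R})$$

for

$$z_i = \frac{l_i}{\sum_{k=1}^{n} l_k}, \quad (i = 1,\cdots,n), \qquad w_i = \frac{m_i}{\sum_{k=1}^{n} m_k}, \quad (i = 1,\cdots,n).$$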
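As a sanity check on this step of the proof, the closed form of $f_{\alpha,\beta}$ can be tested against the functional equation numerically. This is a minimal sketch under stated assumptions: the functional equation is taken to be $f(su,tv) = (s/t)^{1-\alpha} f(u,v) + (u/v)^{1-\beta} f(s,t)$, the closed form is $f(s,t) = ((s/t)^{1-\alpha} - (s/t)^{1-\beta})/\phi$, and the value of `phi` is an arbitrary nonzero constant chosen only for illustration.

```python
import math

def f(s, t, alpha, beta, phi):
    """Closed form f(s,t) = ((s/t)^(1-alpha) - (s/t)^(1-beta)) / phi."""
    r = s / t
    return (r**(1 - alpha) - r**(1 - beta)) / phi

alpha, beta = 0.4, 1.8
phi = alpha - beta           # hypothetical choice of the constant phi(alpha, beta)
s, t, u, v = 2, 5, 3, 4      # illustrative natural numbers with s <= t and u <= v

lhs = f(s * u, t * v, alpha, beta, phi)
rhs = ((s / t)**(1 - alpha) * f(u, v, alpha, beta, phi)
       + (u / v)**(1 - beta) * f(s, t, alpha, beta, phi))
print(lhs, rhs)

# The same equation with (s, t) and (u, v) exchanged.
rhs_swapped = ((u / v)**(1 - alpha) * f(s, t, alpha, beta, phi)
               + (s / t)**(1 - beta) * f(u, v, alpha, beta, phi))
print(lhs, rhs_swapped)
```

Both right-hand sides should agree with the left-hand side up to rounding, which is the consistency that fixes $f_{\alpha,\beta}$ up to the constant $\phi(\alpha,\beta)$.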

Thus we have 



(zi,---,z n \\wi,---,w n 

1-/3 n 



E m fe 
k=i 



i (ir) -t«-'(i) 

i=l v 7 i=l v 7 



1-/3 



(^) 



1-/3 
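Since we can take $l_i$ and $m_i$ arbitrarily, we may take $l_i = l$ and $m_i = m$; then we have

$$D_{\alpha,\beta}(z_1,\cdots,z_n\|w_1,\cdots,w_n) = \frac{\sum_{i=1}^{n} z_i^{\alpha} w_i^{1-\alpha} - \sum_{i=1}^{n} z_i^{\beta} w_i^{1-\beta}}{\phi(\alpha,\beta)}.$$

From (TR1) and the fact that any real number can be approximated by a rational number, the above result is true for any positive real numbers $z_j$ and $w_j$ satisfying $\sum_{j=1}^{n} z_j = \sum_{j=1}^{n} w_j = 1$.

Putting $\beta = 1$ and $\alpha = q$ in the above theorem, we have the uniqueness theorem for a one-parameter extended relative entropy (Theorem II.3).

IV. Characterizations of $\phi(\alpha,\beta)$

In this section, we characterize the constant $\phi(\alpha,\beta)$ depending on two parameters $\alpha$ and $\beta$.

Proposition IV.1 The postulate that our quantity $D_{\alpha,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n)$ defined for any pair of probability distributions:

$$D_{\alpha,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n) = \sum_{j=1}^{n} \frac{x_j^{\alpha} y_j^{1-\alpha} - x_j^{\beta} y_j^{1-\beta}}{\phi(\alpha,\beta)} \quad (5)$$

derived in Theorem III.1 recovers the relative entropy when $\alpha \to 1$ and $\beta \to 1$, that is,

$$\lim_{\alpha\to 1,\ \beta\to 1} D_{\alpha,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n) = k\sum_{j=1}^{n} x_j\left(\log x_j - \log y_j\right) \quad (6)$$

implies the following conditions.

(c1) We have $\lim_{\alpha\to 1}\phi(\alpha,1) = \lim_{\beta\to 1}\phi(1,\beta) = \lim_{\beta\to\alpha}\phi(\alpha,\beta) = 0$ and $\phi(\alpha,\beta) \neq 0$ for $\alpha \neq \beta$.

(c2) There exists an interval $(a,b)$ such that $\phi(\alpha,1)$ and $\phi(1,\beta)$ are differentiable on $(a,1)\cup(1,b)$.

(c3) There exists a constant $k > 0$ such that $\lim_{\alpha\to 1}\frac{d\phi(\alpha,1)}{d\alpha} = \frac{1}{k}$ and $\lim_{\beta\to 1}\frac{d\phi(1,\beta)}{d\beta} = -\frac{1}{k}$.

Proof:

(c1) We may calculate the limit of the left hand side in Eq.(6) in the following ways.

(i) Firstly we may take the limit $\alpha \to 1$ in Eq.(5) and then later take the limit $\beta \to 1$:

$$\lim_{\beta\to 1} D_{1,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n) = \lim_{\beta\to 1}\sum_{j=1}^{n}\frac{x_j - x_j^{\beta} y_j^{1-\beta}}{\phi(1,\beta)}.$$

Since we have $\lim_{\beta\to 1}\sum_{j=1}^{n}\left(x_j - x_j^{\beta} y_j^{1-\beta}\right) = 0$, we need $\lim_{\beta\to 1}\phi(1,\beta) = 0$ in order that the above limit exists.

(ii) In a similar way to (i), we have $\lim_{\alpha\to 1}\phi(\alpha,1) = 0$.

(iii) Firstly we may put $\beta \to \alpha$ and then later take the limit $\alpha \to 1$. In the case $\beta \to \alpha$, the summation of the numerator of the right hand side in Eq.(5) is equal to 0:

$$\lim_{\beta\to\alpha}\sum_{j=1}^{n}\left(x_j^{\alpha} y_j^{1-\alpha} - x_j^{\beta} y_j^{1-\beta}\right) = 0.$$

Therefore we have $\lim_{\beta\to\alpha}\phi(\alpha,\beta) = 0$; otherwise $\lim_{\beta\to\alpha} D_{\alpha,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n)$ takes 0, which contradicts Eq.(6). Since the limit of the left hand side in Eq.(6) must exist, we also have $\phi(\alpha,\beta) \neq 0$ for $\alpha \neq \beta$, because

$$\sum_{j=1}^{n}\left(x_j^{\alpha} y_j^{1-\alpha} - x_j^{\beta} y_j^{1-\beta}\right) \neq 0 \quad \text{for } \alpha \neq \beta.$$

(c2) Since $\sum_{j=1}^{n}\left(x_j - x_j^{\beta} y_j^{1-\beta}\right)$ is differentiable with respect to $\beta$, we need that there exists an interval $(a,b)$ such that $\phi(1,\beta)$ is also differentiable with respect to $\beta$ on $(a,1)\cup(1,b)$, in order that the limit of the left hand side in Eq.(6) exists. In a similar way, there exists an interval $(a,b)$ such that $\phi(\alpha,1)$ is also differentiable with respect to $\alpha$ on $(a,1)\cup(1,b)$.

(c3) Since we have

$$\lim_{\beta\to 1} D_{1,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n) = \lim_{\beta\to 1}\sum_{j=1}^{n}\frac{x_j - x_j^{\beta} y_j^{1-\beta}}{\phi(1,\beta)} = \lim_{\beta\to 1}\frac{-\sum_{j=1}^{n} x_j^{\beta} y_j^{1-\beta}\left(\log x_j - \log y_j\right)}{\frac{d\phi(1,\beta)}{d\beta}},$$

there exists a constant $k > 0$ such that $\lim_{\beta\to 1}\frac{d\phi(1,\beta)}{d\beta} = -\frac{1}{k}$. In a similar way, there exists a constant $k > 0$ such that $\lim_{\alpha\to 1}\frac{d\phi(\alpha,1)}{d\alpha} = \frac{1}{k}$.

Proposition IV.2 $D_{\alpha,\beta}(X\|U)$ takes the minimum value for fixed posterior probability distribution as uniform distribution $U = \{1/n,\cdots,1/n\}$:

$$D_{\alpha,\beta}\left(x_1,\cdots,x_n\,\Big\|\,\frac{1}{n},\cdots,\frac{1}{n}\right) \ge D_{\alpha,\beta}\left(\frac{1}{n},\cdots,\frac{1}{n}\,\Big\|\,\frac{1}{n},\cdots,\frac{1}{n}\right),$$

when we have

(c4) the following relations (i) and (ii) for $\alpha$ and $\beta$:

(i) $\alpha \neq \beta$.

(ii) If $\phi(\alpha,\beta) > 0$, then we have $0 \le \beta \le 1 \le \alpha$. If $\phi(\alpha,\beta) < 0$, then we have $0 \le \alpha \le 1 \le \beta$.

Proof: The second derivative of $D_{\alpha,\beta}\left(x_1,\cdots,x_n\,\|\,\frac{1}{n},\cdots,\frac{1}{n}\right)$ with respect to $x_j$ is calculated as

$$\frac{d^2 D_{\alpha,\beta}\left(x_1,\cdots,x_n\,\|\,\frac{1}{n},\cdots,\frac{1}{n}\right)}{dx_j^2} = \frac{n^{\alpha-1}\alpha(\alpha-1)x_j^{\alpha-2} - n^{\beta-1}\beta(\beta-1)x_j^{\beta-2}}{\phi(\alpha,\beta)}.$$

This takes a positive value in the case of (c4), so that $D_{\alpha,\beta}(X\|U)$ is convex in $x_j$. Therefore $D_{\alpha,\beta}(X\|U)$ takes the minimum value.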
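The conditions of this section can be illustrated numerically for one concrete constant. The sketch below assumes the simple choice $\phi(\alpha,\beta) = \alpha - \beta$ (for which $k = 1$ in (c3)); the function names, sample distributions and random trials are illustrative only.

```python
import math
import random

def d_ab(x, y, alpha, beta):
    """Two-parameter relative entropy with the assumed constant phi = alpha - beta."""
    num = sum(a**alpha * b**(1 - alpha) - a**beta * b**(1 - beta)
              for a, b in zip(x, y))
    return num / (alpha - beta)

def kl(x, y):
    """Relative entropy D_1(X||Y)."""
    return sum(a * (math.log(a) - math.log(b)) for a, b in zip(x, y))

x = [0.5, 0.3, 0.2]
y = [0.2, 0.5, 0.3]

# Limit alpha -> 1, beta -> 1 should recover the relative entropy (k = 1 here).
eps = 1e-5
near_limit = d_ab(x, y, 1 - eps, 1 + eps)
print(near_limit, kl(x, y))

# Proposition IV.2: D(X||U) should not fall below D(U||U) = 0.
random.seed(0)
n = 4
u = [1.0 / n] * n
smallest = float("inf")
for _ in range(200):
    raw = [random.random() + 0.01 for _ in range(n)]
    total = sum(raw)
    sample = [r / total for r in raw]
    smallest = min(smallest, d_ab(sample, u, 0.5, 2.0))
print(smallest, d_ab(u, u, 0.5, 2.0))
```

In this sketch the near-limit value should match the relative entropy closely, and the smallest sampled value of $D_{\alpha,\beta}(X\|U)$ should stay nonnegative.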






V. Properties of a two-parameter extended relative entropy

As an example satisfying the conditions (c1)-(c4) on $\phi(\alpha,\beta)$, we simply take $\phi(\alpha,\beta) = \alpha - \beta$. Then we may define a two-parameter extended relative entropy as follows.

Definition V.1 For two parameters $\alpha, \beta \in \mathbb{R}$ satisfying $0 \le \alpha \le 1 \le \beta$ or $0 \le \beta \le 1 \le \alpha$, and two probability distributions $X = \{x_1,\cdots,x_n\}$ and $Y = \{y_1,\cdots,y_n\}$, we define a two-parameter extended relative entropy by

$$D_{\alpha,\beta}(X\|Y) \equiv \sum_{j=1}^{n}\frac{x_j^{\alpha} y_j^{1-\alpha} - x_j^{\beta} y_j^{1-\beta}}{\alpha - \beta}.$$



Note that a two-parameter extended relative entropy is a generalization of the relative entropy in the sense that

$$\lim_{\alpha\to 1,\ \beta\to 1} D_{\alpha,\beta}(X\|Y) = D_1(X\|Y).$$

We also note that a two-parameter extended relative entropy recovers the Tsallis relative entropy (one-parameter extended relative entropy) when $\alpha = 1$ or $\beta = 1$. The Tsallis relative entropy is also a one-parameter generalization of the relative entropy:

$$\lim_{q\to 1} D_q^T(X\|Y) = D_1(X\|Y).$$

In addition, we note that a two-parameter extended relative entropy is expressed as a convex combination of Tsallis relative entropies:

$$D_{\alpha,\beta}(X\|Y) = \frac{\alpha-1}{\alpha-\beta} D_{\alpha}^T(X\|Y) + \frac{1-\beta}{\alpha-\beta} D_{\beta}^T(X\|Y). \quad (7)$$
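The convex-combination relation (7) can be confirmed numerically. This is a minimal sketch, assuming $D_{\alpha,\beta}(X\|Y) = \sum_j (x_j^{\alpha} y_j^{1-\alpha} - x_j^{\beta} y_j^{1-\beta})/(\alpha-\beta)$ and $D_q^T(X\|Y) = \sum_j (x_j - x_j^{q} y_j^{1-q})/(1-q)$; the names and sample values are illustrative.

```python
import math

def d_ab(x, y, alpha, beta):
    """Two-parameter extended relative entropy with phi = alpha - beta."""
    num = sum(a**alpha * b**(1 - alpha) - a**beta * b**(1 - beta)
              for a, b in zip(x, y))
    return num / (alpha - beta)

def tsallis_rel(x, y, q):
    """Tsallis relative entropy for q != 1."""
    return sum(a - a**q * b**(1 - q) for a, b in zip(x, y)) / (1 - q)

x = [0.5, 0.3, 0.2]
y = [0.2, 0.5, 0.3]
alpha, beta = 0.4, 1.8   # satisfies 0 <= alpha <= 1 <= beta

w_alpha = (alpha - 1) / (alpha - beta)   # weight on D_alpha^T
w_beta = (1 - beta) / (alpha - beta)     # weight on D_beta^T

combo = w_alpha * tsallis_rel(x, y, alpha) + w_beta * tsallis_rel(x, y, beta)
print(w_alpha, w_beta, w_alpha + w_beta)  # nonnegative weights summing to 1
print(d_ab(x, y, alpha, beta), combo)
```

The weights are nonnegative and sum to one in the admissible parameter range, which is why properties of the Tsallis relative entropy that are preserved under convex combinations carry over.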



Thus we have the following properties of a two-parameter extended relative entropy, thanks to the above relation and the properties of the Tsallis relative entropy studied in [15].

Proposition V.2 For a two-parameter extended relative entropy $D_{\alpha,\beta}(X\|Y)$, we have the following properties.

(i) (Nonnegativity) $D_{\alpha,\beta}(X\|Y) \ge 0$.

(ii) (Symmetry)

$$D_{\alpha,\beta}\left(x_{\pi(1)},\cdots,x_{\pi(n)}\|y_{\pi(1)},\cdots,y_{\pi(n)}\right) = D_{\alpha,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n).$$

(iii) (Possibility of extension)

$$D_{\alpha,\beta}(x_1,\cdots,x_n,0\|y_1,\cdots,y_n,0) = D_{\alpha,\beta}(x_1,\cdots,x_n\|y_1,\cdots,y_n).$$

(iv) (Joint convexity) For $0 \le \lambda \le 1$ and the probability distributions $X^{(i)} = \{x_j^{(i)}\}$, $Y^{(i)} = \{y_j^{(i)}\}$, $(i = 1,2;\ j = 1,\cdots,n)$, we have

$$D_{\alpha,\beta}\left(\lambda X^{(1)} + (1-\lambda)X^{(2)}\,\|\,\lambda Y^{(1)} + (1-\lambda)Y^{(2)}\right) \le \lambda D_{\alpha,\beta}\left(X^{(1)}\|Y^{(1)}\right) + (1-\lambda) D_{\alpha,\beta}\left(X^{(2)}\|Y^{(2)}\right).$$

(v) (Monotonicity) For the transition probability matrix $W$, we have

$$D_{\alpha,\beta}(WX\|WY) \le D_{\alpha,\beta}(X\|Y).$$
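Properties (i), (iv) and (v) can be spot-checked on random inputs. The following is a sketch only, assuming $\phi(\alpha,\beta) = \alpha - \beta$, strictly positive distributions, and a column-stochastic matrix for $W$; the helper names (`random_dist`, `apply_channel`) and the parameter values are hypothetical, and a numerical check of this kind is not a proof.

```python
import random

def d_ab(x, y, alpha, beta):
    """Two-parameter extended relative entropy with phi = alpha - beta."""
    num = sum(a**alpha * b**(1 - alpha) - a**beta * b**(1 - beta)
              for a, b in zip(x, y))
    return num / (alpha - beta)

def random_dist(n):
    """Random strictly positive probability vector of length n."""
    raw = [random.random() + 0.01 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def apply_channel(w, x):
    """(WX)_i = sum_j W[i][j] x_j for a column-stochastic matrix W."""
    return [sum(w[i][j] * x[j] for j in range(len(x))) for i in range(len(w))]

random.seed(1)
alpha, beta, n, tol = 0.5, 2.0, 4, 1e-12
ok_nonneg = ok_convex = ok_monotone = True

for _ in range(200):
    x1, y1, x2, y2 = (random_dist(n) for _ in range(4))
    lam = random.random()

    ok_nonneg &= d_ab(x1, y1, alpha, beta) >= -tol

    xm = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    ym = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    bound = lam * d_ab(x1, y1, alpha, beta) + (1 - lam) * d_ab(x2, y2, alpha, beta)
    ok_convex &= d_ab(xm, ym, alpha, beta) <= bound + tol

    cols = [random_dist(n) for _ in range(n)]            # each column sums to 1
    w = [[cols[j][i] for j in range(n)] for i in range(n)]
    ok_monotone &= (d_ab(apply_channel(w, x1), apply_channel(w, y1), alpha, beta)
                    <= d_ab(x1, y1, alpha, beta) + tol)

print(ok_nonneg, ok_convex, ok_monotone)
```

A small tolerance is used in each comparison because the inequalities can hold with near-equality for some random draws.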



VI. Concluding remarks

As we have seen in Section III, the two-parameter extended relative entropy is characterized by continuity, symmetry and additivity. On the other hand, it is known that the $f$-divergence is characterized by symmetry, monotonicity and joint convexity [16].

Closing this section, we give the expressions of a two-parameter extended relative entropy by means of the $f$-divergence:

$$D_f(X\|Y) \equiv \sum_{j=1}^{n} y_j f\left(\frac{x_j}{y_j}\right),$$

where $f$ is a convex function on $(0,\infty)$ and $f(1) = 0$. If we take $f(t) = t\log t$, then the $f$-divergence $D_f(X\|Y)$ recovers the relative entropy. Here, if we put

$$f_{\alpha,\beta}(t) = \frac{t^{\alpha} - t^{\beta}}{\alpha - \beta}, \quad (\alpha \neq \beta), \quad (8)$$

then $\frac{d^2 f_{\alpha,\beta}(t)}{dt^2} \ge 0$ for $0 \le \alpha \le 1 \le \beta$ or $0 \le \beta \le 1 \le \alpha$. We then have the following expression:

$$D_{\alpha,\beta}(X\|Y) = D_{f_{\alpha,\beta}}(X\|Y).$$

The $f$-divergence is often defined by

$$D_{f^*}(X\|Y) \equiv \sum_{j=1}^{n} x_j f^*\left(\frac{y_j}{x_j}\right),$$

where $f^*$ is a convex function on $(0,\infty)$ and $f^*(1) = 0$. If we take $f^*(t) = -\log t$, then the $f$-divergence $D_{f^*}(X\|Y)$ recovers the relative entropy. Here, if we put

$$f^*_{\alpha,\beta}(t) = \frac{t^{1-\alpha} - t^{1-\beta}}{\alpha - \beta}, \quad (\alpha \neq \beta), \quad (9)$$

then $\frac{d^2 f^*_{\alpha,\beta}(t)}{dt^2} \ge 0$ for $0 \le \alpha \le 1 \le \beta$ or $0 \le \beta \le 1 \le \alpha$. Indeed, if the function $f(t)$ is convex on $(0,\infty)$, then the function $g(t) = t f\left(\frac{1}{t}\right)$ is also convex on $(0,\infty)$, because of the elementary calculation:

$$\frac{d^2 g(t)}{dt^2} = \frac{1}{t^3}\,\frac{d^2 f(s)}{ds^2}\Big|_{s = 1/t}.$$

We then have the following expression:

$$D_{\alpha,\beta}(X\|Y) = D_{f^*_{\alpha,\beta}}(X\|Y).$$

It is known that the dual of the $f$-divergence:

$$D^*_f(X\|Y) \equiv D_f(Y\|X)$$

has the relation

$$D^*_f(X\|Y) = D_{f^*}(X\|Y),$$

if we have $f^*(t) = t f\left(\frac{1}{t}\right)$. (For example, see [17] for details.) In the case of the two-parameter extended relative entropy, the above relation holds: $f^*_{\alpha,\beta}(t) = t f_{\alpha,\beta}\left(\frac{1}{t}\right)$. Through the concept of duality in the field of information geometry, we find the relation between Eq.(8) and Eq.(9).
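The two $f$-divergence representations can be compared numerically. This is a minimal sketch, assuming $D_f(X\|Y) = \sum_j y_j f(x_j/y_j)$ with $f_{\alpha,\beta}(t) = (t^{\alpha} - t^{\beta})/(\alpha-\beta)$ and the alternative form $\sum_j x_j f^*(y_j/x_j)$ with $f^*_{\alpha,\beta}(t) = (t^{1-\alpha} - t^{1-\beta})/(\alpha-\beta)$; the function names and sample values are illustrative.

```python
import math

alpha, beta = 0.4, 1.8   # illustrative parameters with 0 <= alpha <= 1 <= beta

def d_ab(x, y):
    """Two-parameter extended relative entropy with phi = alpha - beta."""
    num = sum(a**alpha * b**(1 - alpha) - a**beta * b**(1 - beta)
              for a, b in zip(x, y))
    return num / (alpha - beta)

def f(t):
    """Assumed generator f_{alpha,beta}(t) = (t^alpha - t^beta) / (alpha - beta)."""
    return (t**alpha - t**beta) / (alpha - beta)

def f_star(t):
    """Assumed generator f*_{alpha,beta}(t) = (t^(1-alpha) - t^(1-beta)) / (alpha - beta)."""
    return (t**(1 - alpha) - t**(1 - beta)) / (alpha - beta)

x = [0.5, 0.3, 0.2]
y = [0.2, 0.5, 0.3]

via_f = sum(b * f(a / b) for a, b in zip(x, y))            # sum_j y_j f(x_j / y_j)
via_f_star = sum(a * f_star(b / a) for a, b in zip(x, y))  # sum_j x_j f*(y_j / x_j)
print(d_ab(x, y), via_f, via_f_star)

# Duality between the generators: f*(t) = t f(1/t).
t = 2.5
print(f_star(t), t * f(1 / t))
```

All three printed divergence values should coincide, and the last line illustrates the relation $f^*_{\alpha,\beta}(t) = t f_{\alpha,\beta}(1/t)$ at a single point.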